CIV1498 - Introduction to Data Science

Project - Bike Share Toronto

Part 2 - Exploratory Data Analysis

Table of contents

  1. Usage trends from 2017 to 2020
  2. Behavior of annual and casual members
  3. Analysis of neighbourhoods and bikeshare stations
  4. Patterns in trip duration
  5. Effects of weather
  6. Effects of proximity to TTC subway stations
  7. Impact of COVID

Setup Notebook

Load data

1 Usage trends from 2017 to 2020

1.1 Generate monthly rides

1.2 Plot monthly rides

2 Behavior of annual and casual members

2.1 Analysis of daily rides

2.1.1 Generate daily rides

We also examine daily rides on special days, including statutory holidays and "Free Ride Wednesday". The dates of these special days are specified below:

Statutory holidays: New Year's Day, Family Day, Good Friday, Victoria Day, Canada Day, Civic Holiday, Labour Day, Thanksgiving, Christmas Day, Boxing Day

In some cases, the date of a holiday is shifted to the Monday of the long weekend.
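The Monday adjustment can be sketched as below. This is a minimal illustration, not the notebook's own code: the function name `observed_date` and the rule "Saturday or Sunday moves to the following Monday" are assumptions, and the rule does not handle back-to-back holidays such as Christmas Day and Boxing Day.

```python
from datetime import date, timedelta

def observed_date(holiday: date) -> date:
    """Shift a weekend holiday to the following Monday (assumed rule)."""
    weekday = holiday.weekday()  # Monday is 0, Sunday is 6
    if weekday == 5:  # Saturday: move forward two days
        return holiday + timedelta(days=2)
    if weekday == 6:  # Sunday: move forward one day
        return holiday + timedelta(days=1)
    return holiday  # weekday holidays are left unchanged

# Canada Day (July 1) for each year covered by the project.
canada_day = {year: observed_date(date(year, 7, 1)) for year in range(2017, 2021)}
```

Under this rule, only the 2017 and 2018 dates move, because July 1 fell on a weekend in those two years.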

Dates of Free Ride Wednesday

Source: 2017, 2018, 2019, 2020

Since casual members make very few rides between November and April, only daily data from May to October is used in this section.
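A dependency-free sketch of the May to October filter combined with a daily count per user type is shown below. The trip records, the user type labels, and the helper name `in_season` are hypothetical; with pandas the same logic would typically be a month filter followed by a groupby.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records: (start time, user type).
trips = [
    ("2019-05-01 08:15", "Annual Member"),
    ("2019-05-01 17:40", "Casual Member"),
    ("2019-05-01 18:05", "Annual Member"),
    ("2019-12-24 09:00", "Annual Member"),  # outside May to October, dropped
]

def in_season(day):
    """Keep only dates from May through October."""
    return 5 <= day.month <= 10

# Count rides per (date, user type) pair.
daily_rides = Counter()
for start, user_type in trips:
    day = datetime.strptime(start, "%Y-%m-%d %H:%M").date()
    if in_season(day):
        daily_rides[(day, user_type)] += 1
```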

2.1.2 Distribution of daily rides

2.1.3 Daily rides of different types of days

The outlier in the lower-left corner (red point) falls on 2019-09-10.

2.2 Analysis of hourly data

2.2.1 Generate hourly data

2.2.2 Plot hourly data

3 Analysis of neighbourhoods and bikeshare stations

3.1 Import geographic information

3.1.1 Neighbourhoods geometry

3.1.2 Bikeshare station geometry

3.2 Generate count of rides for neighbourhoods and stations

This subsection counts the total rides for each bikeshare station. Rides that start at a station and rides that end at a station are counted separately. For each neighbourhood, the rides of all stations within the neighbourhood's boundary are also summed.
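The counting described above can be sketched with plain counters. The station ids, the neighbourhood names, and the `station_to_neighbourhood` lookup are hypothetical; the lookup is assumed to come from a spatial join between station points and neighbourhood polygons, which is not shown here.

```python
from collections import Counter

# Hypothetical trips: (start station id, end station id).
trips = [(7000, 7001), (7000, 7002), (7001, 7000), (7002, 7000)]

# Hypothetical result of a station-in-polygon spatial join.
station_to_neighbourhood = {
    7000: "Bay Street Corridor",
    7001: "Annex",
    7002: "Annex",
}

# Departing and arriving rides are counted separately per station.
departures = Counter(start for start, _ in trips)
arrivals = Counter(end for _, end in trips)

def sum_by_neighbourhood(station_counts):
    """Add up station counts for all stations inside each neighbourhood."""
    totals = Counter()
    for station, n in station_counts.items():
        totals[station_to_neighbourhood[station]] += n
    return totals

neighbourhood_departures = sum_by_neighbourhood(departures)
neighbourhood_arrivals = sum_by_neighbourhood(arrivals)

# Stations ranked by departing rides, largest first.
top_departing = departures.most_common(1)
```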

3.2.1 Rides for stations

The departing and arriving stations with the largest number of rides are presented below.

3.2.2 Rides for neighbourhoods

The departing and arriving neighbourhoods with the largest number of rides are presented below.

3.3 Results on interactive maps

The first map shows the neighbourhood boundaries and the locations of bike stations.

The second map is a choropleth map of station density for each neighbourhood.

The third map is a choropleth map of departing rides for each neighbourhood.

The fourth map is a choropleth map of arriving rides for each neighbourhood.

4 Patterns in trip duration

4.1 Distribution of trip duration for annual and casual members

Casual members tend to take longer trips than annual members. A likely reason is that people who use shared bikes for their daily commute prefer an annual membership, and commuting trips are generally shorter.
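One way to check such a comparison numerically is to compute the median trip duration per user type. The sketch below uses made-up durations purely for illustration; the numbers are not taken from the Bike Share Toronto data, and the median is chosen here as an assumption because it is less sensitive to very long trips than the mean.

```python
from collections import defaultdict
from statistics import median

# Made-up durations in seconds, for illustration only.
trips = [
    ("Annual Member", 480), ("Annual Member", 600), ("Annual Member", 720),
    ("Casual Member", 900), ("Casual Member", 1500), ("Casual Member", 2100),
]

# Group durations by user type.
durations = defaultdict(list)
for user_type, seconds in trips:
    durations[user_type].append(seconds)

# Median duration per user type, converted to minutes.
median_minutes = {u: median(v) / 60 for u, v in durations.items()}
```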

4.2 Duration for each weekday

4.2.1 Prepare duration data

4.2.2 Duration for each day of the week

4.3 Mean duration for each month

4.3.1 Prepare duration data

4.3.2 Plot trend of mean duration

4.4 Geographic pattern in duration

4.4.1 Generate mean duration for each neighbourhood

4.4.2 Choropleth map of mean duration

Trip durations in the downtown area are shorter than those in the suburbs. This pattern is reasonable because downtown has a large number of short rides for daily commuting.

5 Effects of weather

5.1 Generate daily weather data

The daily value of each weather feature is the mean of that day's observations. Using the maximum or minimum instead should change only the magnitude of the values, not the general trend, for most features. An additional column called 'Precipitation' was added because the 'non-clear' condition in the 'Weather_simple' column also includes non-precipitating conditions; the new column contains strictly precipitating conditions, so any differences between the two columns can be investigated.
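The daily aggregation can be sketched as follows. The column names 'Weather_simple' and 'Precipitation' and the 'non-clear' label come from the text above; everything else is an assumption for illustration, including the hourly records, the 'Temp' key, the keyword list in `PRECIP_WORDS`, and the rule that a day counts as precipitating if any hourly observation does.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hourly observations: (date, temperature in deg C, condition).
hourly = [
    ("2019-06-01", 18.0, "Clear"),
    ("2019-06-01", 22.0, "Clear"),
    ("2019-06-02", 15.0, "Fog"),
    ("2019-06-02", 17.0, "Rain"),
    ("2019-06-03", 20.0, "Fog"),
]

# Assumed keywords that mark a condition as precipitating.
PRECIP_WORDS = ("rain", "drizzle", "snow", "thunderstorm")

# Group hourly rows by calendar day.
by_day = defaultdict(list)
for day, temp, condition in hourly:
    by_day[day].append((temp, condition))

daily = {}
for day, rows in by_day.items():
    conditions = [c.lower() for _, c in rows]
    daily[day] = {
        # Daily value is the mean of the hourly values.
        "Temp": mean(t for t, _ in rows),
        # 'non-clear' covers fog as well as rain or snow.
        "Weather_simple": "clear" if all(c == "clear" for c in conditions) else "non-clear",
        # True only when at least one observation is precipitating.
        "Precipitation": any(w in c for c in conditions for w in PRECIP_WORDS),
    }
```

In this toy data, the foggy day is 'non-clear' but has no precipitation, which is the distinction the extra column is meant to capture.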

5.2 Plots of daily rides versus weather features

5.2.1 Weather condition and precipitation

The figures above show that, in general, more people ride when there is no precipitation or other weather element that reduces visibility. However, regardless of the weather, there appears to be a minimum number of rides each day, which could be interpreted as riders who do not care about the weather conditions. Additionally, annual members are more likely than casual members to keep riding in subpar weather conditions. Some of the plots exhibit a "stretching" effect toward the higher values, which is caused by a few mild outliers.

5.2.2 Temperature

There is an overall positive trend: the number of rides increases with temperature. This is expected, as warmer weather favours outdoor activities, especially for casual members, whose ridership increases far more dramatically than that of annual members.

5.2.3 Dew Temperature

Dew temperature depends on air temperature and humidity (https://climate.weather.gc.ca/glossary_e.html#), so the result is nearly identical to that of the air temperature figures above.
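The dependence on air temperature and humidity can be illustrated with the Magnus approximation for dew point. This is a sketch for intuition only and is not necessarily how the weather data source computes its values; the constants a = 17.62 and b = 243.12 deg C are one commonly used parameter set, and the function name `dew_point` is hypothetical.

```python
import math

def dew_point(temp_c, rel_humidity_pct, a=17.62, b=243.12):
    """Approximate dew point in deg C from air temperature and relative humidity."""
    gamma = math.log(rel_humidity_pct / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)

# At 100% relative humidity the dew point equals the air temperature.
saturated = dew_point(20.0, 100.0)
# At lower humidity the dew point falls below the air temperature.
half_humid = dew_point(20.0, 50.0)
```

Because humidity only shifts the dew point below the air temperature, the two features move together closely, which is consistent with the near-identical plots.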

5.2.4 Relative Humidity

Overall, relative humidity does not seem to have any appreciable effect.

5.2.5 Wind Speed

As wind speed increases, the number of rides decreases quite noticeably. This can be attributed to the increased difficulty of riding against the wind, especially when facing the wind tunnel effect caused by the tall and dense buildings in the downtown area, as well as the effect of wind chill in the winter.

5.2.6 Visibility

There is some indication that the number of rides increases as visibility increases. Intuitively, this makes sense, as poor visibility is usually associated with poor weather conditions such as precipitation or fog.

5.2.7 Humidex

Interestingly, humidex values do not seem to affect ridership greatly. Perhaps when the weather is sufficiently warm (above 20 degrees Celsius air temperature), the number of rides does not change appreciably. This phenomenon can be seen in the temperature graph above 20 degrees Celsius.

5.2.8 Wind Chill

The increase in rides with increasing wind chill makes sense, since a lower wind chill corresponds to colder weather. The change for casual members appears to be minimal, but given the generally low number of casual riders in the winter months, any change is quite distinguishable. For example, below a wind chill of around -12 there are almost no casual riders, compared to when the wind chill is closer to zero.

5.2.9 Wind Direction

The plot shows no apparent effect of wind direction on ridership. In reality, the wind direction observed at the weather station could differ significantly from the conditions experienced in the city due to differences in location and elevation, wind tunnels, etc.

5.3 Plots of daily average trip duration versus weather features

5.3.1 Weather condition and precipitation

The average trip duration also decreases in poorer weather conditions, but the change is more subtle than the change in the number of rides. This could be because people who do ride are already committed to the trip, so the trip is unlikely to be cut short due to weather.

5.3.2 Temperature

5.3.3 Dew Temperature

5.3.4 Relative Humidity

5.3.5 Wind Speed

5.3.6 Visibility

5.3.7 Humidex

5.3.8 Wind Chill

5.3.9 Wind Direction

The general effect of each weather feature on trip duration follows the same trend as its effect on the number of rides.

Overall, it is clear that air temperature, dew temperature, wind speed, visibility, observable weather condition, and wind chill affect the number of rides and trip duration the most. Wind chill is derived from temperature and wind speed; visibility correlates with observable weather condition (visibility decreases with fog, rain, snow, etc.); dew temperature is also derived from temperature; therefore, the ultimate determining factors are temperature, wind speed, and weather condition.
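To make the dependence of wind chill on temperature and wind speed concrete, the sketch below evaluates the wind chill index formula published by Environment and Climate Change Canada, with temperature in degrees Celsius and wind speed in km/h. The coefficients are quoted as commonly published and should be verified against the official source before being relied upon; the formula is intended for cold temperatures and wind speeds of roughly 5 km/h or more. The function name `wind_chill` is hypothetical.

```python
def wind_chill(temp_c, wind_kmh):
    """Wind chill index from air temperature (deg C) and wind speed (km/h)."""
    v = wind_kmh ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v

# Same air temperature, two wind speeds: more wind gives a lower wind chill.
moderate_wind = wind_chill(-10.0, 20.0)
strong_wind = wind_chill(-10.0, 40.0)
```

Since the index has no inputs other than temperature and wind speed, it adds no independent information beyond those two features.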

6 Effects of proximity to TTC subway stations

6.1 Prepare geographic data

Legend:

Blue: subway stations

Red: bike stations within buffer zone

Black: bike stations outside buffer zone
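A possible way to assign bike stations to the buffer zone is to compute the great-circle distance to the nearest subway station and compare it with a buffer radius. The sketch below is an assumption-laden illustration: the 500 m default radius, the coordinates, and the function names are hypothetical, and the notebook may instead buffer projected geometries with a GIS library.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def within_buffer(bike, subways, radius_m=500):
    """True if the bike station lies within radius_m of any subway station."""
    return any(haversine_m(*bike, *s) <= radius_m for s in subways)

# Illustrative coordinates only (latitude, longitude).
subways = [(43.6453, -79.3806)]
near_station = (43.6463, -79.3806)  # about 111 m north of the subway point
far_station = (43.6653, -79.3806)   # about 2.2 km north of the subway point

near_flag = within_buffer(near_station, subways)  # would be drawn in red
far_flag = within_buffer(far_station, subways)    # would be drawn in black
```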

6.2 Effects of subway access on usage

6.2.1 Effects on number of rides

6.2.2 Effects on trip duration

Bikeshare stations close to subway stations tend to have higher demand and shorter ride durations. The plots above show no obvious difference between departing and arriving rides. This is consistent with commuters who use shared bikes as an extension of the subway riding both to and from subway stations.

7 Impact of COVID

On March 17, 2020, Ontario Premier Ford declared a provincial state of emergency, and the Ontario government extended the state of emergency through April 13, 2020. Let's investigate the impact of the city closure.

7.1 Create filters for the study period for 4 years
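A minimal sketch of a filter for the March 17 to April 13 study period, applied to the same calendar window in each year from 2017 to 2020, is shown below. The function names are hypothetical, and the day offset is an assumed convention (March 17 is day 0) for aligning the four years on a common axis.

```python
from datetime import date

def days_since_march_17(day):
    """Offset in days from March 17 of the same year (March 17 is day 0)."""
    return (day - date(day.year, 3, 17)).days

def in_study_period(day):
    """True for dates from March 17 to April 13, inclusive, in 2017 to 2020."""
    if not 2017 <= day.year <= 2020:
        return False
    return date(day.year, 3, 17) <= day <= date(day.year, 4, 13)

# Example: keep only the dates that fall inside the study window.
candidates = [date(2020, 3, 16), date(2020, 3, 17), date(2019, 4, 13), date(2018, 4, 14)]
kept = [d for d in candidates if in_study_period(d)]
```

With pandas, the same condition could be applied as a boolean mask on the trip start dates.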

7.2 Aggregate the rides data in the level of days

The plot above shows that daily rides in 2020 differ little from those in 2017 and 2018; in all three years, daily rides vary between 1000 and 3000. However, daily rides during the 2020 study period were far lower than in 2019. In 2019, the highest daily count during the study period occurred on the 29th day from March 17 and exceeded 5000 rides, while the highest daily count for 2020 was 3500 rides.

Based on the average daily rides during the study period, the state of emergency did affect daily rides.

Let's investigate the effect of COVID on different types of members.

7.3 Effect of COVID on different types of members

Compared with the other 3 years, 2020 shows an obvious loss in daily rides by annual members, while casual members were not affected as much. Because of the pandemic, many people worked from home, so they used the bikes less often or may even have cancelled their memberships.

The plot above shows that the average duration during the lockdown greatly exceeded the average durations from previous years. Combined with the scatter plot above, this suggests that some casual members still chose to go bike riding during the lockdown, thereby increasing the average duration.

What about after lockdown?

After lockdown, the daily rides of 2020 started to increase to the normal level, which is generally higher than the daily rides of 2019.

The number of rides for both annual and casual members is higher in the March 17 to June 13 period than in the March 17 to April 13 period.